Refactor startup command to wait for node IP changes #598
Signed-off-by: galal-hussein <[email protected]>
Codecov Report
❌ Patch coverage is
@@            Coverage Diff             @@
##             main     #598      +/-   ##
==========================================
- Coverage   59.02%   56.01%   -3.02%
==========================================
  Files          56       55       -1
  Lines        5316     5297      -19
==========================================
- Hits         3138     2967     -171
- Misses       1893     2035     +142
- Partials      285      295      +10
Flags with carried forward coverage won't be shown.
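The summary figures in the report are internally consistent: in each column, Lines equals Hits + Misses + Partials (3138 + 1893 + 285 = 5316 and 2967 + 2035 + 295 = 5297), and the Coverage row matches hits divided by lines, with partial lines counted as uncovered. A worked check, assuming the reported percentages are truncated to two decimals (which matches both values here):

```latex
\begin{aligned}
\text{coverage} &= \frac{\text{hits}}{\text{hits} + \text{misses} + \text{partials}} \\[4pt]
\text{main} &: \frac{3138}{3138 + 1893 + 285} = \frac{3138}{5316} \approx 59.029\% \;\to\; 59.02\% \\[4pt]
\text{\#598} &: \frac{2967}{2967 + 2035 + 295} = \frac{2967}{5297} \approx 56.013\% \;\to\; 56.01\% \\[4pt]
\Delta &\approx 56.013\% - 59.029\% = -3.016 \approx -3.02 \text{ percentage points}
\end{aligned}
```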
enrichman left a comment:
Just a couple of minor nits, but looks good, thanks!
enrichman left a comment:
LGTM, thanks! 👏
This PR adjusts the controller to update the node's address when the server pod restarts. This prevents issues such as the network policy controller starting and then crashing because the node IP changed.

The PR also refactors the startup command, which is now divided into 3 functions:
1- single_server()
This handles the case where k3k starts a cluster with only one server node. It needs special handling because if the server's IP changes (i.e. the pod is deleted and recreated), the cluster quorum has to be reset.
2- ha_server()
This handles a server starting in HA mode. We check whether it is the first server to start: if it is, the server starts with the init config; otherwise, it starts with the normal server config.
3- safe_mode()
When the server pod is recreated, its IP changes. Some services, such as the network policy controller, require the node IP to be correct (see k3s-io/k3s#12844), so k3k needs to handle this situation. The current safe mode disables the network policy controller and starts a temporary server until the kubelet corrects the node IP; once the IP is corrected, k3k exits safe mode gracefully and the pod starts normally.
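The choice between the three paths above can be summarized as a small decision function. This is a minimal Go sketch, not k3k's code: `StartupInput`, `Plan`, and `Decide` are hypothetical names, and it assumes that safe mode is triggered whenever the pod's IP differs from the one recorded on the previous run and that it runs before the normal single-server or HA start. How k3k detects "first server" and the previous IP is left out.

```go
package main

import "fmt"

// StartupInput collects the facts the decision depends on. All field names
// are hypothetical; how k3k actually obtains them is not shown here.
type StartupInput struct {
	Servers       int    // number of server replicas in the virtual cluster
	IsFirstServer bool   // HA only: whether this pod is the first server to start
	PreviousIP    string // node IP from the previous run ("" on first boot)
	CurrentIP     string // IP of the current (possibly recreated) pod
}

// Plan describes what the startup command should do for one pod.
type Plan struct {
	SafeMode      bool   // IP changed: run safe mode before the normal start (assumed ordering)
	Mode          string // "single_server" or "ha_server"
	ResetQuorum   bool   // single server whose IP changed must reset cluster quorum
	UseInitConfig bool   // first HA server starts with the init config
}

// Decide maps the inputs onto the three startup paths from the PR description.
func Decide(in StartupInput) Plan {
	ipChanged := in.PreviousIP != "" && in.PreviousIP != in.CurrentIP
	p := Plan{SafeMode: ipChanged}

	if in.Servers == 1 {
		// single_server(): per the PR description, a lone server whose IP
		// changed needs its cluster quorum reset.
		p.Mode = "single_server"
		p.ResetQuorum = ipChanged
		return p
	}

	// ha_server(): the first server uses the init config; the others
	// start with the normal server config.
	p.Mode = "ha_server"
	p.UseInitConfig = in.IsFirstServer
	return p
}

func main() {
	scenarios := []StartupInput{
		{Servers: 1, PreviousIP: "10.42.0.5", CurrentIP: "10.42.0.9"},
		{Servers: 3, IsFirstServer: true, CurrentIP: "10.42.0.7"},
		{Servers: 3, PreviousIP: "10.42.0.6", CurrentIP: "10.42.0.8"},
	}
	for _, s := range scenarios {
		fmt.Printf("%+v -> %+v\n", s, Decide(s))
	}
}
```

Keeping the decision separate from the side effects (starting servers, resetting quorum) makes each scenario in the Logs section below easy to reason about in isolation.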
Logs
These are example logs from different scenarios:
1- Single-server cluster: the server pod is deleted and recreated:
2- HA cluster: the first server pod is deleted and recreated:
3- HA cluster: the second or third server pod is deleted and recreated: